Abstract

Folksonomy is an emerging approach to classifying information on the
World Wide Web by tagging bookmarks, photos and other web-based content.
The classification is carried out by every user, not only by the authors
of the content or professional editors. This study examines folksonomy as
a complex network. The results indicate that the network formed by the
tags of a folksonomy displays both small-world and scale-free properties.
However, the statistics capture only a local, static slice of the vast
body of folksonomy, which is still evolving.

1 Introduction

1.1 Folksonomy and Tags 

The word folksonomy is a portmanteau of the words folks and taxonomy,
coined by Thomas Vander Wal, which implies that it can be understood as an
organization by folks, especially of content on the World Wide Web. Unlike
traditional approaches to classification, the classifiers in a folksonomy
are not dedicated professionals, and Thomas Vander Wal described this as a
"bottom-up social classification". Adam Mathes explains that in a
folksonomy the users of documents and media create metadata (data about
data) for their own individual use, which is also shared throughout a
community [2].

Del.icio.us (http://del.icio.us), Furl (http://www.furl.net) and Flickr
(http://www.flickr.com) are three of the most popular folksonomies. Their
users describe and organize content (bookmarks, web pages or photos) with
their own vocabulary and assign one or more keywords, namely tags, to each
unit of content. The folksonomy is thus implemented through the assigned
tags. Tags are therefore now the mainstream way of applying folksonomy,
and folksonomy is currently often understood as tagging.

1.2 Folksonomy as Network 

As mentioned above, folksonomy enables users to share their individual use
of tags within the community. Users share various pieces of content under
the same tag, or share different tags assigned to one piece of content.
Tags are thus linked to each other, and so are the pieces of content. This
feature makes it possible to understand the folksonomy as a network of
tags or of contents.

Besides the network of folksonomy, several similar networks have been
reported to display small-world or scale-free properties [8]. The graph
properties of the World Wide Web can be measured in order to quantify the
information therein and to explain its evolution [5]. In 2001, Ferrer i
Cancho and Solé defined a network on the English language. Another study,
by Yook, Jeong and Barabási, constructed a network based on synonyms
according to the Merriam-Webster Dictionary. They observed a small average
path length, a high clustering coefficient and a power-law degree
distribution, and indicated that language also forms a complex network in
some respects. Rosa Gil et al. modeled and analyzed the semantic web as a
complex system.

In the light of these works and results, the network of folksonomy can be
defined and constructed. Compared with the network modeled by Rosa Gil et
al., the difference lies mainly in the difference between tags and the
ontologies in the DAML Ontology Library.

2 Properties of Folksonomy Network 

In order to learn the structure of the folksonomy network realized through
tags, to see whether it displays small-world or scale-free properties, and
to measure the folksonomy, the model of the network must first be defined.
Folksonomy can be considered as a graph in which nodes represent tags, and
different tags assigned to the same piece of content are linked by edges.
This graph is undirected. It is also unweighted: two tags are joined by a
single edge regardless of how many pieces of content they share.

Degree distribution For a selected node i in the folksonomy network, its
degree $k_i$ represents the number of tags that share at least one piece
of content with the tag (or node) i. For each network, the spread in node
degree follows a distribution function $P(k)$. For scale-free networks,
the degree follows a power-law distribution

$$P(k) \sim k^{-\gamma}.$$

Clustering coefficient A node i with degree $k_i$ is connected to $k_i$
nodes in the network. There are $E_i$ edges in the subgraph formed by
these $k_i$ nodes, and there could be at most
$C_{k_i}^{2} = \frac{1}{2} k_i (k_i - 1)$ edges between them. The ratio

$$C_i = \frac{2 E_i}{k_i (k_i - 1)}$$

is the clustering coefficient of the node i. The clustering coefficient
$C_i$ measures the interrelatedness of i's neighbors.

Average path length For two nodes i, j in the same connected component,
$l_{ij}$ is the minimum length of a path between them. The average path
length $l$ is the average value of all $l_{ij}$.
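
The three quantities defined above can be illustrated on a toy graph. The following is a minimal sketch, not the procedure used in the experiment: the adjacency-dictionary representation, the function names, and the convention that a node with fewer than two neighbors has clustering coefficient 0 are all assumptions made for illustration.

```python
from collections import deque


def degree(graph, node):
    """k_i: the number of neighbors of the node."""
    return len(graph[node])


def clustering(graph, node):
    """C_i = 2 E_i / (k_i (k_i - 1)); taken as 0.0 when k_i < 2 (assumed convention)."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Each edge among the neighbors is seen from both endpoints, hence // 2.
    edges = sum(len(graph[n] & neighbors) for n in neighbors) // 2
    return 2.0 * edges / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all connected ordered pairs (BFS per node)."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph: a triangle a-b-c with an extra node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(degree(toy, "c"), clustering(toy, "c"), average_path_length(toy))
```

In the toy graph, node c has three neighbors with one edge among them, so its clustering coefficient is 1/3, while node a sits in a closed triangle and has coefficient 1.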



3 Experiment 

The data set of the experiment is based on the records of the bookmarks
submitted to Del.icio.us between 26 and 27 March 2005. Del.icio.us provides
a service that enables users to categorize their bookmarks or links with
tags.

All the data used in this experiment are available by subscribing to the
RSS feed of Del.icio.us (http://del.icio.us/rss). For each bookmark entry,
only the URL, the time of submission and the tags were recorded. Other
information, such as the creator and the title, was ignored in the
experiment.

For every distinct URL, all the tags attributed to it are linked to each
other with edges. The network is thus constructed.
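
The construction rule above can be sketched in a few lines. This is an illustrative reading rather than the original implementation: the `(url, tags)` record shape, the function name `build_tag_graph`, and the example.org URLs are hypothetical, and the sketch assumes that tags from separate submissions of the same URL are merged before being linked.

```python
from collections import defaultdict
from itertools import combinations


def build_tag_graph(bookmarks):
    """Build an undirected, unweighted tag graph as {tag: set of linked tags}.

    bookmarks is an iterable of (url, tags) pairs (hypothetical record shape).
    """
    tags_by_url = defaultdict(set)
    for url, tags in bookmarks:
        # Merge the tags of every submission of the same URL (assumption).
        tags_by_url[url].update(tags)

    graph = defaultdict(set)
    for tags in tags_by_url.values():
        for tag in tags:
            graph[tag]  # register the node even if it ends up with no edges
        # Link every pair of tags attributed to this URL; sets keep edges unique.
        for a, b in combinations(sorted(tags), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


sample = [
    ("http://example.org/a", ["python", "web"]),
    ("http://example.org/a", ["web", "tutorial"]),
    ("http://example.org/b", ["python", "network"]),
]
graph = build_tag_graph(sample)
print(sorted(graph["python"]))
```

Storing neighbors in sets means that a pair of tags sharing several URLs still yields a single edge, which matches the unweighted graph model of Section 2.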



3.1 Folksonomy as a Small World Network 

Random networks were first defined by P. Erdős and A. Rényi in 1959. In a
random network of the Erdős-Rényi model, the average path length
$l_{random}$ is small with regard to the size N of the network,

$$l_{random} \sim \frac{\ln N}{\ln \langle k \rangle},$$

and its clustering coefficient is

$$C_{random} \sim \frac{\langle k \rangle}{N}.$$

The small-world network of Watts and Strogatz displays, like the random
network with the same N and $\langle k \rangle$, a similarly small average
path length

$$l \approx l_{random},$$

but with a relatively high clustering coefficient

$$C \gg C_{random}.$$

The properties of the network of folksonomy tags in the experiment turn
out as follows.

• Nodes (the number of tags) N: 9804

• Average node degree $\langle k \rangle$: 11.0

• Clustering coefficient C: 0.06

• Average path length l: 3.40

For the network in the experiment, the average path length l = 3.40 is
approximately the length $l_{random}$ = 3.83 of the corresponding random
network, and its clustering coefficient C = 0.06 is much larger than the
prediction $C_{random}$ = 0.001 for a random network. Therefore it can be
concluded to be a small-world network.
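
The random-network baselines quoted above follow directly from the reported N and average degree. The short check below reproduces the arithmetic; the variable names are chosen for illustration only.

```python
import math

N = 9804            # number of tags (nodes) reported in the experiment
mean_degree = 11.0  # average node degree <k> reported in the experiment

# Erdős-Rényi estimates: l_random ~ ln N / ln <k>, C_random ~ <k> / N
l_random = math.log(N) / math.log(mean_degree)
c_random = mean_degree / N

print(f"l_random = {l_random:.2f}")   # l_random = 3.83
print(f"C_random = {c_random:.4f}")   # C_random = 0.0011
```

With the measured C = 0.06, the clustering coefficient is roughly fifty times the random-network estimate, while the path lengths 3.40 and 3.83 are of the same order.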

3.2 Folksonomy as a Scale Free Network 

Many real networks are reported to be scale free [8], i.e. their degree
distribution $P(k)$ follows a power law

$$P(k) \sim k^{-\gamma}.$$

In Erdős and Rényi's theory, by contrast, the degree distribution $P(k)$
of a random network follows a Poisson distribution.

The scale-free property can be detected in the folksonomy network. Figure
1 indicates that the distribution is linear in logarithmic scale, as is
its Complementary Cumulative Distribution Function (CCDF) in Figure 2. The
result from the folksonomy network (see Figure 1) shows that its degree
distribution decays at the rate of $k^{-\gamma}$, where the power-law
exponent $\gamma$ is 1.418.
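
One common way to obtain such an exponent is a least-squares line through the points (log k, log P(k)). The sketch below assumes that approach, since the text does not state the fitting procedure; the function names, the choice of P(K >= k) as the CCDF convention, and the synthetic data are illustrative assumptions. It returns the correlation coefficient R alongside the exponent.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)}, the fraction of nodes having degree k."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: counts[k] / total for k in sorted(counts)}


def ccdf(distribution):
    """Return {k: P(K >= k)}, accumulating P(k) from the largest degree down."""
    tail = 0.0
    result = {}
    for k in sorted(distribution, reverse=True):
        tail += distribution[k]
        result[k] = tail
    return {k: result[k] for k in sorted(result)}


def fit_power_law(distribution):
    """Least-squares line through (log k, log P(k)); returns (gamma, R)."""
    points = [(math.log(k), math.log(p))
              for k, p in distribution.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return -slope, r  # P(k) ~ k^(-gamma), so gamma is the negated slope


# Synthetic check: an exact power law with exponent 1.418 should be recovered.
synthetic = {k: k ** -1.418 for k in range(1, 51)}
gamma, r = fit_power_law(synthetic)
print(round(gamma, 3), round(r, 3))

small = degree_distribution([1, 1, 2, 3])
small_ccdf = ccdf(small)
```

On real degree data the fitted slope is sensitive to the sparse, noisy tail of P(k), which is one reason the smoother CCDF is often plotted alongside it.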

Table 1 lists the top 20 tags in the experiment with the highest degree in
the network, namely those that have the most connections with other tags.

4 Conclusion and Future Work 

The experiment samples a part of the folksonomy at Del.icio.us, and the
results above demonstrate that the folksonomy, as a network formed by
tags, displays both small-world and scale-free properties.

However, the folksonomy network has been shown to be small-world and
scale-free only locally. The body of folksonomy is much larger than this
fragment: the tags indexed by Technorati across the WWW number more than 1
million. It is possible that the overall picture of folksonomy and the
parameters of the whole network differ from the present local ones.

Since users and authors on the WWW submit content to folksonomies every
minute, the folksonomy network evolves over time. The experiment surveyed
the static properties of a folksonomy network, but the network grows
dynamically at every moment. The study of dynamics on complex networks
will be applied in further analysis of the structure and behavior of
folksonomy, including its formation mechanism. Since folksonomy is a
classification system for web content, its properties, both static and
dynamic, can also serve the search and retrieval of related information.

Figure 1: Degree distribution of the folksonomy network, in logarithmic
scale. R is the correlation coefficient.

Figure 2: Degree Complementary Cumulative Distribution Function (CCDF), in
logarithmic scale. R is the correlation coefficient.

Table 1: Top 20 degree tags